Redesign live-to-final assistant replies #3401
|
This is a large slice (57 files, ~6k lines), so I focused on the one part with real runtime-behavior risk: the rewritten "missing final assistant reply" detection in The
|
The provider/model reasoning-effort coercion (coerce_reasoning_effort_for_model, _filter_reasoning_efforts_for_provider, and their call-site wrappers) is unrelated to the live-to-final assistant reply experience and changes behavior for all reasoning-capable models. Reverting it here keeps PR nesquena#3401 focused on live stream / worklog / auto-compression / stream-ownership. The StreamChannel snapshot/event-id changes in config.py are part of the live-stream replay work and intentionally remain. The coercion ships separately so it gets a provider-capability-focused review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
| Filename | Overview |
|---|---|
| api/config.py | StreamChannel gains subscribe_with_snapshot, note_last_event_id, and 3-tuple put_nowait; _last_event_id is now maintained atomically under _lock; well-structured and tested. |
| api/routes.py | Replay dedup logic (snapshot_cutoff_seq / replay_cutoff_seq) is correct for the common case; _start_chat_stream_for_session loop correctly re-checks ownership under lock. |
| api/streaming.py | Adds 3-tuple event queueing with note_last_event_id, post-compression tool-result pruning, and updated compression message strings; changes are focused and low-risk. |
| api/run_journal.py | Adds max_seq parameter to read_run_events for upper-bound filtering; minimal, correct change. |
| static/messages.js | _completeAutomaticCompressionOnLiveProgress now defined; activityBurstAnchor / segmentSeq tracking added to INFLIGHT; _hashString allocates String(value |
| static/sessions.js | loadSession INFLIGHT deletion for journalReplayFromStart/stale cursor entries is correct; _loadingSessionId guard fix allows same-session reload during pending switches; commented-out line left in the non-INFLIGHT active-stream branch. |
| static/ui.js | Large addition of worklog/live-run-status helpers; ensureLiveWorklogShell, showLiveRunStatus, _moveLiveRunStatusToTurnEnd all well-guarded with typeof checks; _stripVisibleAssistantEchoFromThinking semantics narrowed to exact-match only. |
| static/style.css | Large style rework for live-worklog, compression divider, run-status footer; muted typography for lifecycle chrome vs assistant prose. |
| tests/test_inflight_stream_reuse.py | New static-analysis tests covering same-stream reuse, CONNECTING transport rejection on reconnect, and same-session no-op guard fix. |
| tests/test_auto_compression_card.py | Tests for _completeAutomaticCompressionOnLiveProgress definition and per-event-listener call, and elapsed-timer no-op after PR changes. |
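
To make the `api/config.py` row concrete, here is a minimal sketch of a channel that returns a subscriber queue together with a snapshot of the last event id, both taken under the same lock. The method names come from the table above; the bodies, the `_subscribers` list, and the use of `queue.Queue` are assumptions for illustration, not the actual implementation.

```python
import queue
import threading


class StreamChannel:
    """Toy fan-out channel. An illustrative stand-in, NOT the real api/config.py class."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = []        # hypothetical: one queue per attached SSE handler
        self._last_event_id = None    # most recent event id seen by the channel

    def note_last_event_id(self, event_id):
        # Record the id under the lock so snapshots never observe a torn update.
        with self._lock:
            self._last_event_id = event_id

    def put_nowait(self, item):
        # Items are 3-tuples: (event, data, event_id).
        event, data, event_id = item
        with self._lock:
            if event_id is not None:
                self._last_event_id = event_id
            # Unbounded queues never block, so broadcasting under the lock is safe here.
            for subscriber in self._subscribers:
                subscriber.put_nowait(item)

    def subscribe_with_snapshot(self):
        # Register the queue and read the last event id in one critical section,
        # so every later item lands in the queue and every earlier one is covered
        # by the snapshot.
        subscriber = queue.Queue()
        with self._lock:
            self._subscribers.append(subscriber)
            snapshot = {"last_event_id": self._last_event_id}
        return subscriber, snapshot
```

Because registration and the snapshot read share one critical section, a caller can treat `snapshot["last_event_id"]` as an exact boundary between "replay from the journal" and "read from the queue".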
Sequence Diagram
sequenceDiagram
participant FE as Frontend (sessions.js / messages.js)
participant SC as StreamChannel (config.py)
participant RJ as RunJournal (run_journal.py)
participant SSE as SSE Handler (routes.py)
participant Worker as Streaming Worker (streaming.py)
Note over FE,Worker: Live turn in progress
Worker->>SC: note_last_event_id(event_id)
Worker->>SC: put_nowait((event, data, event_id))
SC->>SC: "_last_event_id = event_id"
SC-->>FE: broadcast 3-tuple to active subscribers
Note over FE,Worker: User switches session — SSE disconnects
SC->>SC: Buffer items in _offline_buffer
Note over FE,Worker: User switches back — reconnect
FE->>SSE: "GET /api/chat/stream?replay=1&after_seq=N&after_event_id=X"
SSE->>SC: subscribe_with_snapshot()
SC-->>SSE: "(subscriber_queue, {last_event_id: Y})"
SSE->>SSE: "snapshot_cutoff_seq = parse(Y)"
SSE->>RJ: "_replay_run_journal(after_seq=N, max_seq=Y, include_stale=False)"
RJ-->>SSE: journal events N+1 to Y
SSE-->>FE: replay events via SSE
SSE->>SSE: "replay_cutoff_seq = Y"
loop Live stream tail
SC-->>SSE: item from subscriber_queue
SSE->>SSE: "event_seq = parse(item.event_id)"
alt "event_seq <= replay_cutoff_seq"
SSE->>SSE: skip duplicate
else "event_seq > replay_cutoff_seq"
SSE-->>FE: emit SSE event
end
end
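
The replay-then-tail flow in the diagram can be sketched as follows. This is a simplified model under stated assumptions: event ids are plain integer strings, the journal is an in-memory list of `(seq, event, data)` tuples, and `parse_seq` and `stream_with_replay` are hypothetical helper names. Only `read_run_events`, `max_seq`, `snapshot_cutoff_seq`, and `replay_cutoff_seq` are names taken from the review.

```python
def parse_seq(event_id):
    """Return the integer sequence for an event id, or None if it is missing or invalid."""
    try:
        return int(event_id)
    except (TypeError, ValueError):
        return None


def read_run_events(journal, after_seq=0, max_seq=None):
    """Yield journal events with after_seq < seq <= max_seq (no upper bound if max_seq is None)."""
    for seq, event, data in journal:
        if seq <= after_seq:
            continue
        if max_seq is not None and seq > max_seq:
            continue
        yield seq, event, data


def stream_with_replay(journal, live_items, after_seq, snapshot_event_id):
    """Replay journal events up to the snapshot, then tail live items without duplicates."""
    snapshot_cutoff_seq = parse_seq(snapshot_event_id)
    replay_cutoff_seq = after_seq
    if snapshot_cutoff_seq is not None:
        replay_cutoff_seq = max(replay_cutoff_seq, snapshot_cutoff_seq)

    # Phase 1: replay what the client missed, bounded above by the snapshot.
    for seq, event, data in read_run_events(journal, after_seq, snapshot_cutoff_seq):
        replay_cutoff_seq = max(replay_cutoff_seq, seq)
        yield seq, event, data

    # Phase 2: tail the live queue, skipping anything the replay already covered.
    for event, data, event_id in live_items:
        seq = parse_seq(event_id)
        if seq is not None and seq <= replay_cutoff_seq:
            continue
        yield seq, event, data
```

Bounding the journal read with `max_seq` matters because the journal may already contain events newer than the snapshot; without the upper bound those events would be emitted once by the replay and again by the live tail.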
Reviews (17): Last reviewed commit: "Close #3401 merge review test gaps"
|
Pushed follow-up commits addressing review feedback. Summary of what changed on this branch:
Local verification: Note: this branch will periodically re-conflict on 🤖 Generated with Claude Code |
|
Updated with merge commit 53727cb to bring the branch onto latest master, resolve the CHANGELOG conflict, and remove the duplicate _active_stream_ids import noted in review. Local verification: git diff --check; pytest tests/test_cancelled_turn_status.py tests/test_webui_runtime_diagnostics.py tests/test_stage364_opus_live_sse_event_id.py tests/test_run_journal_routes.py tests/test_stale_stream_cleanup.py tests/test_inflight_stream_reuse.py tests/test_run_journal_streaming_static.py tests/test_sprint42.py tests/test_regressions.py -q (179 passed). |
|
Conflict cleanup and CI fallout fixes pushed through What changed:
Verification:
AI assistance: Codex coordinated the merge cleanup with a sub-agent, reviewed the returned diff, fixed the CI-reported merge fallout, and reran focused verification before pushing. |
|
Conflict cleanup pushed in What changed:
Verification:
AI assistance: Codex performed the conflict cleanup and focused regression review. |
|
Conflict cleanup pushed in What changed:
Verification:
AI assistance: Codex Autopilot performed the branch refresh, reviewed final scope, ran focused verification, pushed, and read back GitHub state. |
|
Updated the #3401 Thinking/Worklog blocker. Product model:
Implementation:
Tradeoff:
Verification:
|
…to-final-assistant-replies # Conflicts: # api/streaming.py # static/messages.js # static/ui.js # tests/test_issue765_streaming_persistence.py # tests/test_ui_tool_call_cleanup.py
|
Follow-up after the latest refresh/re-gate:
|
…ds on reconnect, fixes #3707) (#3766)

* fix(streaming): replay restored live tool cards on reconnect (#3763, fixes #3707)

Post-#3401 (#3400 live-to-final epic) recovery residual. When a running session is restored from its in-memory live-turn snapshot and then reattached to the SSE stream, the restore-success path skipped replaying persisted live tool calls, leaving restored live text/thinking but an EMPTY Worklog until a later SSE event or the final render rebuilt the turn.

- Extract the persisted-tool-card replay into replayPersistedLiveToolCards() (reads S.toolCalls or INFLIGHT[sid].toolCalls); run it on restoredLiveTurn && didReconnect, not only the !restoredLiveTurn fallback.
- Dedup safety: restore-success replay passes {skipUnkeyedRestoredDuplicates:true}. When the restored snapshot already has .tool-card-row rows, an UNKEYED persisted tool is skipped to avoid a duplicate; keyed cards still replay and appendLiveToolCard's tid-dedup replaces the correct restored row.
- appendLiveToolCard() and the new liveToolReplayId() both key on tid||id||tool_call_id||tool_use_id||call_id (consistent 5-alias set), so the dedup covers all known id shapes.
- Both replay sites pass {sessionId, streamId} so the ownership guard applies.
- Regression coverage: restore-success+reconnect replays tools; unkeyed-restored duplicates skipped; all-id-alias dedup; prior ordering invariants preserved.

Correct post-#3401 fix for #3707 (supersedes the closed #3724).

Co-authored-by: franksong2702 <[email protected]>

* docs(changelog): stamp v0.51.309 — Release JY (stage-a5b #3763)

---------

Co-authored-by: nesquena-hermes <[email protected]>
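
The alias-keyed dedup described in this commit can be modeled in a few lines. The sketch below is a language-neutral Python model of the idea, not the frontend JavaScript: `live_tool_replay_id` and `replay_persisted_tool_cards` are hypothetical Python counterparts of `liveToolReplayId()` and `replayPersistedLiveToolCards()`, and tool cards are represented as plain dicts rather than DOM rows.

```python
# The five id aliases named in the commit message, in lookup order.
ID_ALIASES = ("tid", "id", "tool_call_id", "tool_use_id", "call_id")


def live_tool_replay_id(tool):
    """Return the first non-empty id alias as a string, or None for an unkeyed tool."""
    for key in ID_ALIASES:
        value = tool.get(key)
        if value:
            return str(value)
    return None


def replay_persisted_tool_cards(restored_rows, persisted_tools,
                                skip_unkeyed_restored_duplicates=False):
    """Merge persisted tool calls into restored rows without creating duplicates."""
    rows = list(restored_rows)
    had_restored_rows = bool(rows)
    # Map each keyed row to its position so a keyed replay replaces it in place.
    index = {}
    for position, row in enumerate(rows):
        key = live_tool_replay_id(row)
        if key is not None:
            index[key] = position

    for tool in persisted_tools:
        key = live_tool_replay_id(tool)
        if key is None:
            # An unkeyed tool cannot be matched to a row; skip it when the
            # restored snapshot already has rows, to avoid a duplicate.
            if skip_unkeyed_restored_duplicates and had_restored_rows:
                continue
            rows.append(tool)
        elif key in index:
            rows[index[key]] = tool
        else:
            index[key] = len(rows)
            rows.append(tool)
    return rows
```

Checking every alias in one helper is what lets a row restored under `tid` be matched by a persisted record that only carries `tool_call_id`.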
* docs(rfc): add Transparent Stream activity display mode RFC (#3820) Proposes Transparent Stream as an opt-in, chronological activity display mode alongside the default Compact Worklog (#3400/#3401). Captures the display-mode split agreed in #3820: each tool call as a first-class chronological event, interleaved with reasoning/progress, with compact previews, consistent across live, settled, and reload/replay paths. Documents the asymmetry in the existing `simplified_tool_calling` toggle (live-only, no settled/reload branch) and the three concrete integration points so the follow-up can be sliced safely. Doc-only; no behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(rfc): refine Transparent Stream rollout scope --------- Co-authored-by: Frank Song <franksong2702@gmail.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
/#3876) (#3886) Fixes #3869: empty legacy three-dot thinking spinners piled up as stale rows after the agent finished thinking. The live-to-final redesign (#3401) made the thinking-card-row wrapper class unconditional, which broke finalizeThinkingCard()'s dots-only detection — it treated the wrapper class itself as a "has content" signal, so the dots-only removal branch went dead. Narrow hasContent to the actual .thinking-card element so dots-only spinners are removed on finalize while real Worklog Thinking Cards are preserved. Includes #3869 regression coverage (brace-walks finalizeThinkingCard, asserts the narrowed check + that real thinking cards are not removed). Co-authored-by: nesquena-hermes <[email protected]> Co-authored-by: franksong2702 <franksong2702@users.noreply.github.com>
Fixes #3875: chat transcript rendering as only a stack of date separators with no message bodies. The live-to-final/Worklog redesign (#3401) folds intermediate assistant segments into a collapsed Worklog and hides the source segment; when a turn's ONLY content is folded into a collapsed Worklog (empty final assistant message from an interrupted/autonomous run, or a reload where S.toolCalls did not hydrate so the Worklog has no expandable steps), every segment is hidden and the turn paints blank — leaving a bare column of date dividers. Adds a defensive fail-safe invariant at the end of renderMessages(): a settled assistant turn never renders with zero visible content. Blank turns get their folded Worklog expanded (or hidden segments un-hidden as a last resort). Turns with any visible answer are untouched, preserving the intended collapsed-Worklog UX. Reproduced + verified fixed in an isolated browser (clean Chrome profile to defeat the ?v= asset-cache); RED on master (blank 'Worklog' chip), GREEN with the fix (Worklog expanded, content visible). Includes #3875 structural regression coverage.
… + #3887 + #3831) (#3889) * Release v0.51.342 — Release LF (blank-transcript brick fix #3875) Fixes #3875: chat transcript rendering as only a stack of date separators with no message bodies. The live-to-final/Worklog redesign (#3401) folds intermediate assistant segments into a collapsed Worklog and hides the source segment; when a turn's ONLY content is folded into a collapsed Worklog (empty final assistant message from an interrupted/autonomous run, or a reload where S.toolCalls did not hydrate so the Worklog has no expandable steps), every segment is hidden and the turn paints blank — leaving a bare column of date dividers. Adds a defensive fail-safe invariant at the end of renderMessages(): a settled assistant turn never renders with zero visible content. Blank turns get their folded Worklog expanded (or hidden segments un-hidden as a last resort). Turns with any visible answer are untouched, preserving the intended collapsed-Worklog UX. Reproduced + verified fixed in an isolated browser (clean Chrome profile to defeat the ?v= asset-cache); RED on master (blank 'Worklog' chip), GREEN with the fix (Worklog expanded, content visible). Includes #3875 structural regression coverage. * docs(ui): clarify revealed-flag intent in #3875 fail-safe (greptile P2) Address greptile review on PR #3889: the 'revealed' flag means 'turn has a visible non-empty Worklog group' not 'we just expanded one'. An already-open non-empty group is itself visible, so the last-resort un-hide is correctly skipped. Comment-only; no behavior change. --------- Co-authored-by: nesquena-hermes <[email protected]>
Thinking Path
Refs #3400.
Refs #3014 and supersedes #3015.
What Changed
you may provide wording
Compressing context
Context auto-compressed while the run continues
CONNECTING EventSource instances
Why It Matters
Running agent sessions are where users build trust in Hermes WebUI. The UI should make active work legible without confusing internal lifecycle state for final assistant content. This PR moves the experience closer to mature agent clients such as Codex and Claude Code: progress remains visible while work is happening, lifecycle detail is available when useful, and the final answer remains readable and distinct.
Contract Routing
docs/rfcs/webui-run-state-consistency-contract.md, docs/UIUX-GUIDE.md, DESIGN.md, focused frontend/static tests, run-journal replay tests, and manual 8788 live-session validation.
Verification
node --check static/ui.js static/messages.js static/sessions.js static/workspace.js static/panels.js static/i18n.js
git diff --check origin/master
python3 scripts/ruff_lint.py --diff origin/master
origin/master
python -m pytest -q tests/test_sprint42.py tests/test_auto_compression_card.py tests/test_stale_stream_cleanup.py tests/test_inflight_stream_reuse.py tests/test_run_journal_routes.py tests/test_run_journal_frontend_static.py: 108 passed, 1 warning
python -m pytest tests/ -q --timeout=60 --shard-id=0 --num-shards=32391 passed, 6 skipped, 2 xpassed, 1 warning; one local failure in tests/test_profile_skills_stats.py::test_get_profile_skills_stats from the macOS platform fixture assumption, unrelated to this PR's diff
Compressing context and Context auto-compressed as centered live dividers
Screenshots
Live running state with prose, muted tool rows, and the centered compression divider:
Compression completion while the run continues:
Dark theme live state with prose, quiet tool row, token/timer footer:
Expanded quiet tool rows remain visually subordinate to assistant prose:
Final settled state keeps the folded L1 Worklog above assistant content:
Risks / Follow-ups
Last-Event-ID support
Model Used
AI-assisted.